Full Report

Climate Change: Visualization for the global warming with world temperatures data and Greenhouse Gas Emission Report data in Canada

Group Project DATA 601: Fall 2020

Group 9 Shahbaz Masih, Yu Nakamura, Shade Oguntoyinbo

Agenda:

  • 1. Introduction
  • 2. Dataset
  • 3. Guiding Questions
  • 4. Tasks
  • 5. Conclusion
In [50]:
import pandas as pd
import geopandas as gpd
import numpy as np
import datetime as dt

import matplotlib as mpl
import matplotlib.pyplot as plt
import plotly as plotly
import plotly.express as px
import plotly.offline as py
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
import seaborn as sns

py.init_notebook_mode(connected=True)
mpl.style.use('ggplot')

Introduction

  • Climate Change is a globally acclaimed discussion with different countries assuming responsibilities on varying scale towards a more sustainable future. Some have pledged to reduce their carbon footprint or greenhouse gases (GHG) emission by certain percentages over the foreseeable future. Others have chosen to believe this is a hoax.
  • The current warming trend is of particular significance because most of it is extremely likely (greater than 95 percent probability) to be the result of human activity since the mid-20th century and proceeding at a rate that is unprecedented over decades to millennia. Global Climate Change Vital Signs of the Planet, 2020
  • As a country, Canada has pledged to meeting our Paris Agreement GHG emission reduction target of 30% below 2005 levels by 2030. Greenhouse Gas sources and sinks: executive summary 2020
  • We are investigating this phenomenon to see what trends exist and how temperatures have changed by countries over several years to further understand the effects of climate change.

Data Sets

  • Dataset 1: Climate Change: Earth Surface Temperature Data
    Souce: Kaggle

    • Features world monthly measurement data collection of land surface temperature from the 1750 to 2013
    • The dataset is licensed under open data source by Kaggle which allow us to share, adapt and publish its information for non-commercial purpose.
    • File format: csv, Structure, tabular form: 4 columns, 577,463 rows (GlobalLandTemperaturesByCountry.csv), 5 columns, 645,676 rows (GlobalLandTemperaturesByState.csv), 7 columns, 239,000 rows (GlobalLandTemperaturesByMajorCity.csv)
    • Last updated 2017
    • 243 unique countries (GlobalLandTemperaturesByCountry.csv), 241 unique states and 7 countries (GlobalLandTemperaturesByState.csv), 49 unique countries and 100 cities (GlobalLandTemperaturesByMajorCity.csv)
  • Dataset 2: Greenhouse Gas Reporting Program (GHGRP)
    Souce: GHGRP

    • Features yearly reported data of GHG emission measurement at each facility in Canada from 2004 to 2018
    • The dataset is an open government license which enable us to copy, modify, publish, translate, adapt, distribute, or otherwise use the information in any medium, mode or format for any lawful purpose.
    • File format: csv, Structure, tabular form: 77 columns, 9,606 rows
    • Last updated 2018 Oct
    • 2502 unique Facility Names, 1990 unique Facility Location, 13 unique Facility Provinces
In [51]:
# Read "global temperature by State" csv file
GlobalTempState = pd.read_csv("./GlobalLandTemperaturesByState.csv")
display(GlobalTempState.head())

# Read "GHG emissino data in Canada" csv file
GHG = pd.read_csv("./PDGES-GHGRP-GHGEmissionsGES-2004-Present.csv",  encoding = "ISO-8859-1", engine='python')
# Change columns name:
GHG = GHG[["Reference Year / Année de référence" , "Facility Name / Nom de l'installation", "Facility City or District or Municipality / Ville ou District ou Municipalité de l'installation", "Facility Province or Territory / Province ou territoire de l'installation", "Latitude", "Longitude", "Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)", "English Facility NAICS Code Description / Description du code SCIAN de l'installation en anglais"]]    
GHG = GHG.rename({"Reference Year / Année de référence": "YEAR", "Facility Name / Nom de l'installation":"FacilityName", "Facility Province or Territory / Province ou territoire de l'installation":"State", "Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)":"TotalEmission", "Facility City or District or Municipality / Ville ou District ou Municipalité de l'installation":"FacilityCity", "English Facility NAICS Code Description / Description du code SCIAN de l'installation en anglais":"FacilityCode"}, axis='columns')    
display(GHG.head())

#Read "Global Temperature" csv file
GLT = pd.read_csv('GlobalTemperatures.csv')
display(GLT.head())

#Read "Global Land Temperatures By Country" csv file
LTbyC = pd.read_csv('GlobalLandTemperaturesByCountry.csv')
display(LTbyC.head())

#Read "country_codes" csv file
ctry = pd.read_csv('country_codes.csv') 
ctry.Country = ctry.Country.str.strip() 
ctry = ctry.replace(['Russian Federation'],'Russia')
display(ctry.head())
dt AverageTemperature AverageTemperatureUncertainty State Country
0 1855-05-01 25.544 1.171 Acre Brazil
1 1855-06-01 24.228 1.103 Acre Brazil
2 1855-07-01 24.371 1.044 Acre Brazil
3 1855-08-01 25.427 1.073 Acre Brazil
4 1855-09-01 25.675 1.014 Acre Brazil
YEAR FacilityName FacilityCity State Latitude Longitude TotalEmission FacilityCode
0 2018 Division Alma Alma Quebec 48.56500 -71.65556 9.522334e+04 Mechanical pulp mills
1 2018 Foothills Pipeline, Alberta Airdrie Alberta NaN NaN 3.823373e+05 Pipeline transportation of natural gas
2 2018 Kingston CoGen Bath Ontario 44.20950 -76.72460 1.195507e+03 Fossil-fuel electric power generation
3 2018 Redwater Fertilizer Operations Sturgeon County Alberta 53.84200 -113.09300 1.288410e+06 Chemical fertilizer (except potash) manufacturing
4 2018 Alberta Envirofuels Edmonton Alberta 53.53199 -113.36492 2.976423e+05 Other basic organic chemical manufacturing
dt LandAverageTemperature LandAverageTemperatureUncertainty LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature LandMinTemperatureUncertainty LandAndOceanAverageTemperature LandAndOceanAverageTemperatureUncertainty
0 1750-01-01 3.034 3.574 NaN NaN NaN NaN NaN NaN
1 1750-02-01 3.083 3.702 NaN NaN NaN NaN NaN NaN
2 1750-03-01 5.626 3.076 NaN NaN NaN NaN NaN NaN
3 1750-04-01 8.490 2.451 NaN NaN NaN NaN NaN NaN
4 1750-05-01 11.573 2.072 NaN NaN NaN NaN NaN NaN
dt AverageTemperature AverageTemperatureUncertainty Country
0 1743-11-01 4.384 2.294 Ã…land
1 1743-12-01 NaN NaN Ã…land
2 1744-01-01 NaN NaN Ã…land
3 1744-02-01 NaN NaN Ã…land
4 1744-03-01 NaN NaN Ã…land
Code Country
0 ABW Aruba
1 AFG Afghanistan
2 AGO Angola
3 AIA Anguilla
4 ALA Ã…land Islands

Guiding Questions

  1. The first question to address will be if the data contains any outliers and missing values which will require preliminary data exploration (Global temperature data has been questioned for its integrity in the past).
  2. Is there any correlation between major cities (the main center of activities and industries) temperature rise and corresponding state and countries and global temperature rise?
  3. How has average land surface temperature changed from 1750 to 2013? When dramatic temperature increase was observed?
    • Can we confirm the global warning from this dataset?
  4. Which countries faced the significant temperature change year over year? Any trends by countries? How about Canada?
  5. How Greenhouse Gas (GHG) emission measured at each facilities station decrease or increased in Canada over years?
    • Any difference between province. If so, what is the main factor of its difference?
  6. Any relationship between GHG emission and average surface temperatures in Canada?

Tasks and Contributions

  • Each number of task below corrensponds to above number of Guiding Question.
  • Shahbaz Masih conducted data wrangling and created visualizations for Guiding Question 1 and 2
  • Shade Oguntoyinbo conducted data wrangling and created visualizations for Guiding Question 3 and 4
  • Yu Nakamura conducted data wrangling and created visualizations for Guiding Question 5 and 6

Task : Guiding Question 1

  • To address question guiding question (a), data will be looked at for basic statistics functions like minimum, maximum, quartile, standard deviation, and mean values of each variable to have a general understanding of range of data values and data spread (from standard deviation). Box plots will be plotted to see data distribution and to identify outliers. The outliers from the data will be removed by using inter-quartile range (IQR) score method. Pandas and matplotlib will be used in this task.
In [52]:
GlobalTemperatures = pd.read_csv("GlobalTemperatures.csv")
display(GlobalTemperatures)
GlobalTemperatures.describe()
#checking nan values
print("Number of NaN values for the column dt :", GlobalTemperatures['dt'].isnull().sum())
print("Number of NaN values for the column LandAverageTemperature :", GlobalTemperatures['LandAverageTemperature'].isnull().sum())
print("Number of NaN values for the column LandMaxTemperature :", GlobalTemperatures['LandMaxTemperature'].isnull().sum())
print("Number of NaN values for the column LandMinTemperature :", GlobalTemperatures['LandMinTemperature'].isnull().sum())
print("Number of NaN values for the column LandAndOceanAverageTemperature :", GlobalTemperatures['LandAndOceanAverageTemperature'].isnull().sum())
#removing nan values
GlobalTemperatures = GlobalTemperatures[GlobalTemperatures['LandAverageTemperature'].notna()]
GlobalTemperatures.rename(columns={'LandAverageTemperature': 'GlobalAverageTemperature'}, inplace=True)
GlobalTemperatures.head()
#Reading country tempertature data and checking statistics
GlobalLandTemperaturesByCountry = pd.read_csv("GlobalLandTemperaturesByCountry.csv")
display(GlobalLandTemperaturesByCountry)
GlobalLandTemperaturesByCountry.describe()
#checking nan values
print("Number of NaN values for the column Average Temperature :", GlobalLandTemperaturesByCountry['AverageTemperature'].isnull().sum())
print("Number of NaN values for the column Average Temperature Uncertainty :", GlobalLandTemperaturesByCountry['AverageTemperatureUncertainty'].isnull().sum())
#removing nan values
GlobalLandTemperaturesByCountry = GlobalLandTemperaturesByCountry[GlobalLandTemperaturesByCountry['AverageTemperature'].notna()]
GlobalLandTemperaturesByCountry.rename(columns={'AverageTemperature': 'CountryAverageTemperature'}, inplace=True)
GlobalLandTemperaturesByCountry.head()
#Reading state tempertature data and checking statistics
GlobalLandTemperaturesByState = pd.read_csv("GlobalLandTemperaturesByState.csv")
display(GlobalLandTemperaturesByState)
GlobalLandTemperaturesByState.describe()
#checking nan values
print("Number of NaN values for the column Average Temperature :", GlobalLandTemperaturesByState['AverageTemperature'].isnull().sum())
print("Number of NaN values for the column Average Temperature Uncertainty :", GlobalLandTemperaturesByState['AverageTemperatureUncertainty'].isnull().sum())
#removing nan values
GlobalLandTemperaturesByState = GlobalLandTemperaturesByState[GlobalLandTemperaturesByState['AverageTemperature'].notna()]
GlobalLandTemperaturesByState.rename(columns={'AverageTemperature': 'StateAverageTemperature'}, inplace=True)
GlobalLandTemperaturesByState.head()
#Reading major city tempertature data and checking statistics
GlobalLandTemperaturesByMajorCity = pd.read_csv("GlobalLandTemperaturesByMajorCity.csv")
display(GlobalLandTemperaturesByMajorCity)
GlobalLandTemperaturesByMajorCity.describe()
#checking nan values
print("Number of NaN values for the column Average Temperature :", GlobalLandTemperaturesByMajorCity['AverageTemperature'].isnull().sum())
print("Number of NaN values for the column Average Temperature Uncertainty :", GlobalLandTemperaturesByMajorCity['AverageTemperatureUncertainty'].isnull().sum())
#removing nan values
GlobalLandTemperaturesByMajorCity = GlobalLandTemperaturesByMajorCity[GlobalLandTemperaturesByMajorCity['AverageTemperature'].notna()]
GlobalLandTemperaturesByMajorCity.shape
GlobalLandTemperaturesByMajorCity.rename(columns={'AverageTemperature':'CityAverageTemperature'}, inplace=True)
GlobalLandTemperaturesByMajorCity.head()
#reading GHG data
CanadaGHGData = pd.read_csv("PDGES-GHGRP-GHGEmissionsGES-2004-Present.csv", encoding = "ISO-8859-1", engine='python')
display(CanadaGHGData)
CanadaGHGData.describe()
#grouping the data by year and summing up the total emissions
YearlyCanadaGHGData =CanadaGHGData[['Reference Year / Année de référence', 'Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)']]
YearlyCanadaGHGData.head()
YearlyCanadaGHGData = YearlyCanadaGHGData.groupby(['Reference Year / Année de référence'], as_index=False)['Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)'].sum()
YearlyCanadaGHGData.head()
dt LandAverageTemperature LandAverageTemperatureUncertainty LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature LandMinTemperatureUncertainty LandAndOceanAverageTemperature LandAndOceanAverageTemperatureUncertainty
0 1750-01-01 3.034 3.574 NaN NaN NaN NaN NaN NaN
1 1750-02-01 3.083 3.702 NaN NaN NaN NaN NaN NaN
2 1750-03-01 5.626 3.076 NaN NaN NaN NaN NaN NaN
3 1750-04-01 8.490 2.451 NaN NaN NaN NaN NaN NaN
4 1750-05-01 11.573 2.072 NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ...
3187 2015-08-01 14.755 0.072 20.699 0.110 9.005 0.170 17.589 0.057
3188 2015-09-01 12.999 0.079 18.845 0.088 7.199 0.229 17.049 0.058
3189 2015-10-01 10.801 0.102 16.450 0.059 5.232 0.115 16.290 0.062
3190 2015-11-01 7.433 0.119 12.892 0.093 2.157 0.106 15.252 0.063
3191 2015-12-01 5.518 0.100 10.725 0.154 0.287 0.099 14.774 0.062

3192 rows × 9 columns

Number of NaN values for the column dt : 0
Number of NaN values for the column LandAverageTemperature : 12
Number of NaN values for the column LandMaxTemperature : 1200
Number of NaN values for the column LandMinTemperature : 1200
Number of NaN values for the column LandAndOceanAverageTemperature : 1200
dt AverageTemperature AverageTemperatureUncertainty Country
0 1743-11-01 4.384 2.294 Ã…land
1 1743-12-01 NaN NaN Ã…land
2 1744-01-01 NaN NaN Ã…land
3 1744-02-01 NaN NaN Ã…land
4 1744-03-01 NaN NaN Ã…land
... ... ... ... ...
577457 2013-05-01 19.059 1.022 Zimbabwe
577458 2013-06-01 17.613 0.473 Zimbabwe
577459 2013-07-01 17.000 0.453 Zimbabwe
577460 2013-08-01 19.759 0.717 Zimbabwe
577461 2013-09-01 NaN NaN Zimbabwe

577462 rows × 4 columns

Number of NaN values for the column Average Temperature : 32651
Number of NaN values for the column Average Temperature Uncertainty : 31912
dt AverageTemperature AverageTemperatureUncertainty State Country
0 1855-05-01 25.544 1.171 Acre Brazil
1 1855-06-01 24.228 1.103 Acre Brazil
2 1855-07-01 24.371 1.044 Acre Brazil
3 1855-08-01 25.427 1.073 Acre Brazil
4 1855-09-01 25.675 1.014 Acre Brazil
... ... ... ... ... ...
645670 2013-05-01 21.634 0.578 Zhejiang China
645671 2013-06-01 24.679 0.596 Zhejiang China
645672 2013-07-01 29.272 1.340 Zhejiang China
645673 2013-08-01 29.202 0.869 Zhejiang China
645674 2013-09-01 NaN NaN Zhejiang China

645675 rows × 5 columns

Number of NaN values for the column Average Temperature : 25648
Number of NaN values for the column Average Temperature Uncertainty : 25648
dt AverageTemperature AverageTemperatureUncertainty City Country Latitude Longitude
0 1849-01-01 26.704 1.435 Abidjan Côte D'Ivoire 5.63N 3.23W
1 1849-02-01 27.434 1.362 Abidjan Côte D'Ivoire 5.63N 3.23W
2 1849-03-01 28.101 1.612 Abidjan Côte D'Ivoire 5.63N 3.23W
3 1849-04-01 26.140 1.387 Abidjan Côte D'Ivoire 5.63N 3.23W
4 1849-05-01 25.427 1.200 Abidjan Côte D'Ivoire 5.63N 3.23W
... ... ... ... ... ... ... ...
239172 2013-05-01 18.979 0.807 Xian China 34.56N 108.97E
239173 2013-06-01 23.522 0.647 Xian China 34.56N 108.97E
239174 2013-07-01 25.251 1.042 Xian China 34.56N 108.97E
239175 2013-08-01 24.528 0.840 Xian China 34.56N 108.97E
239176 2013-09-01 NaN NaN Xian China 34.56N 108.97E

239177 rows × 7 columns

Number of NaN values for the column Average Temperature : 11002
Number of NaN values for the column Average Temperature Uncertainty : 11002
GHG ID No. / No d'identification de GES Reference Year / Année de référence Facility Name / Nom de l'installation Facility Location / Emplacement de l'installation Facility City or District or Municipality / Ville ou District ou Municipalité de l'installation Facility Province or Territory / Province ou territoire de l'installation Facility Postal Code / Code postal de l'installation Latitude Longitude Facility NPRI ID / Numéro d'identification de l'INRP ... C4F8 (tonnes) C4F8 (tonnes CO2e / tonnes éq. CO2) C5F12 (tonnes) C5F12 (tonnes CO2e / tonnes éq. CO2) C6F14 (tonnes) C6F14 (tonnes CO2e / tonnes éq. CO2) PFC Total (tonnes CO2e / tonnes éq. CO2) SF6 (tonnes) SF6 (tonnes CO2e / tonnes éq. CO2) Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)
0 G10001 2018 Division Alma 1100 Melanion Street Alma Quebec G8B 5W2 48.56500 -71.65556 983.0 ... 0.0 0.0 0 0 0 0 0.0 NaN NaN 9.522334e+04
1 G10003 2018 Foothills Pipeline, Alberta NaN Airdrie Alberta T4A 2G7 NaN NaN NaN ... 0.0 0.0 0 0 0 0 0.0 NaN NaN 3.823373e+05
2 G10004 2018 Kingston CoGen 5146 Taylor-Kidd Boulevard Bath Ontario K0H 1G0 44.20950 -76.72460 5765.0 ... 0.0 0.0 0 0 0 0 0.0 NaN NaN 1.195507e+03
3 G10006 2018 Redwater Fertilizer Operations 56225 SH643 Sturgeon County Alberta T0A 2W0 53.84200 -113.09300 2134.0 ... 0.0 0.0 0 0 0 0 0.0 NaN NaN 1.288410e+06
4 G10007 2018 Alberta Envirofuels 9511 17th Street Edmonton Alberta T6P 1Y3 53.53199 -113.36492 3974.0 ... 0.0 0.0 0 0 0 0 0.0 NaN NaN 2.976423e+05
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9600 G10323 2004 Windsor Essex Cogeneration Plant 2600 Temple Drive Windsor Ontario N8W 5J5 42.28770 -82.98030 5267.0 ... 0.0 0.0 0 0 0 0 0.0 0.0 0.0 1.787137e+05
9601 G10324 2004 Wolf Lake/Primrose Thermal Operation 10-08-066-05W4 Bonnyville Alberta T9N2J7 NaN NaN 4136.0 ... 0.0 0.0 0 0 0 0 0.0 0.0 0.0 1.455941e+06
9602 G10325 2004 Works 84, Owen Sound Flat Glass Plant Lots 6,7,8 of Range 7, East of the Garafraxa Road Owen Sound Ontario N4K2C3 NaN NaN 4861.0 ... 0.0 0.0 0 0 0 0 0.0 0.0 0.0 1.037091e+05
9603 G10326 2004 Zama Gas Plant: 1, 2, 3 13-12-116-6 W6M Zama City Alberta T0H4E0 NaN NaN 5285.0 ... 0.0 0.0 0 0 0 0 0.0 0.0 0.0 1.381902e+05
9604 G11730 2004 Usine Vaudreuil 1955, boul Mellon Édifice 109 Jonquière Quebec G7S4L2 NaN NaN 2978.0 ... 0.0 0.0 0 0 0 0 0.0 0.0 0.0 7.038797e+05

9605 rows × 77 columns

Out[52]:
Reference Year / Année de référence Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)
0 2004 2.783856e+08
1 2005 2.779966e+08
2 2006 2.719402e+08
3 2007 2.773294e+08
4 2008 2.722536e+08
In [53]:
#checking outliers by box plots
ax1 = sns.boxplot(data=GlobalTemperatures)
ax1.set_xticklabels(ax1.get_xticklabels(),rotation=90,fontsize=12)
ax1.set_ylabel('Global Tempertures',fontsize=16)
plt.title('Global Temperatures Box Plots',fontsize=20)
plt.show
Out[53]:
<function matplotlib.pyplot.show(*args, **kw)>
In [54]:
#checking outliers by box plots
ax2 = sns.boxplot(data=GlobalLandTemperaturesByCountry)
ax2.set_xticklabels(ax1.get_xticklabels(),rotation=90,fontsize=12)
ax2.set_ylabel('Tempertures',fontsize=16)
plt.title('Global Temperatures By Country Box Plots',fontsize=20)
plt.show
Out[54]:
<function matplotlib.pyplot.show(*args, **kw)>
In [7]:
#checking outliers by box plots
ax3 = sns.boxplot(data=GlobalLandTemperaturesByState)
ax3.set_xticklabels(ax1.get_xticklabels(),rotation=90,fontsize=12)
ax3.set_ylabel('Tempertures',fontsize=16)
plt.title('Global Temperatures By State Box Plots',fontsize=20)
Out[7]:
Text(0.5, 1.0, 'Global Temperatures By State Box Plots')
In [8]:
#checking outliers by box plots
ax4 = sns.boxplot(data=GlobalLandTemperaturesByMajorCity)
ax4.set_xticklabels(ax1.get_xticklabels(),rotation=90,fontsize=12)
ax4.set_ylabel('Tempertures',fontsize=16)
plt.title('Global Temperatures By Major City Box Plots',fontsize=20)
Out[8]:
Text(0.5, 1.0, 'Global Temperatures By Major City Box Plots')
In [9]:
ax5 = sns.boxplot(data=CanadaGHGData['Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)'])
ax5.set(ylabel='Total Emissions')
#checking outliers by box plots
#ax5 = sns.boxplot(data=CanadaGHGData['Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)'])
#ax5.set_xticklabels(ax1.get_xticklabels(),rotation=90,fontsize=12)
#ax5.set_ylabel('Total Emissions',fontsize=16)
plt.title('Total GHG Emissions Box Plot',fontsize=20)
Out[9]:
Text(0.5, 1.0, 'Total GHG Emissions Box Plot')
In [10]:
sns.distplot(CanadaGHGData['Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)'], kde=False, color='blue', bins=100)
plt.title('Total Emissions', fontsize=18)
plt.xlabel('Total Emissions (tonnes CO2e)', fontsize=16)
plt.ylabel('Frequency', fontsize=16)
Out[10]:
Text(0, 0.5, 'Frequency')
In [11]:
#IQR score method to remove the outliers
GHGData = CanadaGHGData[['Reference Year / Année de référence', 'Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)']] 
GHGData.head()
Q1 = GHGData.quantile(0.25)
Q3 = GHGData.quantile(0.75)
IQR = Q3 - Q1
print(IQR)
df2 = GHGData[~((GHGData < (Q1 - 1.5 * IQR)) |(GHGData > (Q3 + 1.5 * IQR))).any(axis=1)]
df2.shape
Reference Year / Année de référence                                        7.0000
Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)    274351.5394
dtype: float64
Out[11]:
(8368, 2)
In [12]:
#Box plot after removing outliers
ax6 = sns.boxplot(data=df2['Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)'])
ax6.set(ylabel='Total Emissions')
plt.title('Total GHG Emissions Box Plot After outliers removal',fontsize=20)
Out[12]:
Text(0.5, 1.0, 'Total GHG Emissions Box Plot After outliers removal')

Task : Guiding Question 2

  • A correlation matrix is build to see if there is any any correlation between major cities (the main center of activities and industries) temperature rise and corresponding state and countries and global temperature rise. Both absolute and differential values will be looked at to see any correlations.
  • Line plots were also looked at but did not answer the guiding question properly
  • Another dataset (cities with states) was used to combine cities data set with states, countries and global data set.
  • Datasets were merged together to combine them altogether matching keys and to suit the purpose
In [13]:
#https://raw.githubusercontent.com/datasets/world-cities/master/data/world-cities.csv
citiesdata = pd.read_csv("cities.csv")
citiesdata.head()
#subcountry was renamed to stat
citiesdata.rename(columns={'name': 'City', 'subcountry': 'State'}, inplace=True)
citiesdata.head()
citieswithstates = citiesdata[['City', 'State']]
citieswithstates.head()
#cities and states temperatures data were merged together
GlobalLandTemperaturesByMajorCitywithstate = pd.merge(GlobalLandTemperaturesByMajorCity, citieswithstates, on='City')
GlobalLandTemperaturesByMajorCitywithstate.head()
citiesandstates = GlobalLandTemperaturesByMajorCitywithstate.merge(GlobalLandTemperaturesByState, on=['dt', 'State', 'Country'])
citiesandstates.head()
#cities and states data was merged with countries data
citiestatesandcountries = citiesandstates.merge(GlobalLandTemperaturesByCountry, on=['dt', 'Country'])
citiestatesandcountries.head()
#cities, states and countries data was merged with global data 
citiestatescountriesandglobe = GlobalTemperatures.merge(citiestatesandcountries, on=['dt'])
citiestatescountriesandglobe.head()
#checking if there are any null values
citiestatescountriesandglobe.info(verbose=True, null_counts=True)
#calculating delta Temperature for city, state, country and globe
citiestatescountriesandglobe= citiestatescountriesandglobe.sort_values(by =['City', 'dt'] )

citiestatescountriesandglobe['GlobalDeltaT'] = citiestatescountriesandglobe.groupby('City')['GlobalAverageTemperature'].diff(-1) * (-1)
citiestatescountriesandglobe['StateDeltaT'] = citiestatescountriesandglobe.groupby('City')['StateAverageTemperature'].diff(-1) * (-1)
citiestatescountriesandglobe['CountryDeltaT'] = citiestatescountriesandglobe.groupby('City')['CountryAverageTemperature'].diff(-1) * (-1)
citiestatescountriesandglobe['CityDeltaT'] = citiestatescountriesandglobe.groupby('City')['CityAverageTemperature'].diff(-1) * (-1)
citiestatescountriesandglobe.head()
#separating absolute values of temperature for city, state, country and globe
temperaturedata = citiestatescountriesandglobe[['dt', 'City', 'GlobalAverageTemperature', 'CountryAverageTemperature', 
                                                'StateAverageTemperature', 'CityAverageTemperature' ]]
temperaturedata.head()
#separating delta temperatures for city, state, country and globe 
temperaturechangedata = citiestatescountriesandglobe[['dt', 'City','GlobalDeltaT', 'CountryDeltaT', 
                                                'StateDeltaT', 'CityDeltaT' ]]
temperaturechangedata.head()
#Chicago temperature data
ChicagoTempData= temperaturedata[temperaturedata['City'] == 'Chicago']
ChicagoTempChangeData= temperaturechangedata[temperaturedata['City'] == 'Chicago']

#name of cities in the final dataset
citiestatescountriesandglobe['City'].unique()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39723 entries, 0 to 39722
Data columns (total 20 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   dt                                         39723 non-null  object 
 1   GlobalAverageTemperature                   39723 non-null  float64
 2   LandAverageTemperatureUncertainty          39723 non-null  float64
 3   LandMaxTemperature                         34958 non-null  float64
 4   LandMaxTemperatureUncertainty              34958 non-null  float64
 5   LandMinTemperature                         34958 non-null  float64
 6   LandMinTemperatureUncertainty              34958 non-null  float64
 7   LandAndOceanAverageTemperature             34958 non-null  float64
 8   LandAndOceanAverageTemperatureUncertainty  34958 non-null  float64
 9   CityAverageTemperature                     39723 non-null  float64
 10  AverageTemperatureUncertainty_x            39723 non-null  float64
 11  City                                       39723 non-null  object 
 12  Country                                    39723 non-null  object 
 13  Latitude                                   39723 non-null  object 
 14  Longitude                                  39723 non-null  object 
 15  State                                      39723 non-null  object 
 16  StateAverageTemperature                    39723 non-null  float64
 17  AverageTemperatureUncertainty_y            39723 non-null  float64
 18  CountryAverageTemperature                  39723 non-null  float64
 19  AverageTemperatureUncertainty              39723 non-null  float64
dtypes: float64(14), object(6)
memory usage: 6.4+ MB
Out[13]:
array(['Belo Horizonte', 'Chengdu', 'Chicago', 'Dalian', 'Guangzhou',
       'Jaipur', 'Kanpur', 'Los Angeles', 'Melbourne', 'Nagpur',
       'Nanjing', 'Pune', 'Salvador', 'Shenyang', 'Sydney', 'Tangshan',
       'Toronto', 'Wuhan'], dtype=object)
In [14]:
fig, ax = plt.subplots(1,2, figsize=(13,7))
ChicagoTempData.plot(x="dt", y=['GlobalAverageTemperature', 'CountryAverageTemperature', 
                                                'StateAverageTemperature', 'CityAverageTemperature' ], ax=ax[0])
ChicagoTempChangeData.plot(x="dt", y=['GlobalDeltaT', 'CountryDeltaT', 
                                                'StateDeltaT', 'CityDeltaT'], ax=ax[1])
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2bd71e9d30>
In [15]:
temperaturedata.corr()
Out[15]:
GlobalAverageTemperature CountryAverageTemperature StateAverageTemperature CityAverageTemperature
GlobalAverageTemperature 1.000000 0.477616 0.497273 0.536401
CountryAverageTemperature 0.477616 1.000000 0.908425 0.876018
StateAverageTemperature 0.497273 0.908425 1.000000 0.983177
CityAverageTemperature 0.536401 0.876018 0.983177 1.000000
In [16]:
sns.heatmap(temperaturedata.corr(), annot = True)
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2bd547cdf0>
In [17]:
temperaturechangedata.corr()
Out[17]:
GlobalDeltaT CountryDeltaT StateDeltaT CityDeltaT
GlobalDeltaT 1.000000 0.714798 0.658119 0.674521
CountryDeltaT 0.714798 1.000000 0.942465 0.921064
StateDeltaT 0.658119 0.942465 1.000000 0.976911
CityDeltaT 0.674521 0.921064 0.976911 1.000000
In [18]:
sns.heatmap(temperaturechangedata.corr(), annot = True)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2bd5656100>

Task 2

Results and Discussion

  • City data had better correlation with corresponding state and country than with global data
  • If there higher activity in a major city with more GHG emissions, its effect is transmitting to related state and country but not so strogly globally. (Another data trend between GHG emission and temperatures does not show any correlation between the two)

Task : Guiding Question 3

How has average land surface temperature changed from 1775 to 2015? Was any dramatic temperature change observed? Can we confirm the global warming from this dataset?

Note:
  • This review was carried out skipping every 5 years to fully visualize the changes
  • Recognized changes between 7 - 24% from 1775 to 2015
  • Dramatic temperature drop seen between 1810 - 1819 (cold decade)
  • Dramatic temperature incrrease noticed in 1980, 1940, 2005, 2010 & 2015. Some of these are on record as the warmest years yet
  • Increasing land temperature seen across the globe
In [19]:
#Cleaning, preparing and wrangling data
GLT['dt'] = pd.to_datetime(GLT['dt'])
yseries = GLT['dt'].dt.year
mseries = GLT['dt'].dt.month

GLT = pd.DataFrame({'Year' : yseries, 'Month' : mseries, 'LandAvgTemp' : GLT['LandAverageTemperature']})

grouped = GLT.groupby(['Year','Month'])
GLT_sum = grouped.sum()
GLT_sum = GLT_sum.loc[1775:2015]

yrlyLandTemp = GLT_sum.mean(axis=1, skipna=True).mean(level=['Year']).reset_index()
yrlyLandTemp = yrlyLandTemp.rename(columns={0:'LandAvgTemp'})

yrlyLandTemp_20 = yrlyLandTemp.loc[::5]
TotalTemp1775 = yrlyLandTemp_20.loc[(yrlyLandTemp_20.Year==1775), 'LandAvgTemp']
yrlyLandTemp_20['% Change since 1775']=yrlyLandTemp_20.LandAvgTemp.apply(lambda x : ((x-TotalTemp1775)/TotalTemp1775)*100)
display(yrlyLandTemp_20.head())
<ipython-input-19-dec6461f9925>:17: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Year LandAvgTemp % Change since 1775
0 1775 9.183083 0.000000
5 1780 9.432917 2.720582
10 1785 7.363000 -19.819959
15 1790 7.982333 -13.075674
20 1795 8.350333 -9.068305
In [20]:
fig = px.bar(yrlyLandTemp_20, y=yrlyLandTemp_20['% Change since 1775'], x='Year', color='% Change since 1775', 
             title = "Change in Average Global Land Temperature: 1775 - 2015")
fig.show()
In [21]:
fig = go.Figure()
fig = go.Figure(data=go.Scatter(x=yrlyLandTemp_20['Year'], y=yrlyLandTemp_20['% Change since 1775'], 
                                mode='lines+markers',marker=dict(color='red'), line=dict(color='black')))
fig.update_layout(title_text="Change in Average Global Land Temperature: 1775 - 2015")
fig.update_xaxes(title_text="Year", type='category')
fig.update_yaxes(title_text="% Change since 1775")
fig.show()

Task : Guiding Question 4

In [22]:
#Cleaning, merging and grouping files
LTbyC['dt'] = pd.to_datetime(LTbyC['dt'])
yseries = LTbyC['dt'].dt.year
mseries = LTbyC['dt'].dt.month

LTbyC = pd.DataFrame({'Year' : yseries, 'Month' : mseries, 'AverageTemperature' : LTbyC['AverageTemperature'], 
                       'Country' : LTbyC['Country']})
LTbyC_merged = pd.merge(LTbyC, ctry, on='Country')
LTbyC_merged = LTbyC_merged.groupby(['Year','Code','Country','Month']).sum()
grouped = LTbyC_merged.groupby(['Year','Code','Country']).mean()
grouped.iloc[1752:2012:10]

yearlyLandTemp = grouped.mean(axis=1, skipna=True).mean(level=['Year','Code','Country']).reset_index()
yearlyLandTemp = yearlyLandTemp.rename(columns={0:'AvgTemp'})

#bins = pd.IntervalIndex.from_tuples([(-22,-12),(-12,-2),(-2,2),(2,12), (12,22), (22,32)], closed ='left')
bins = pd.IntervalIndex.from_tuples([(-22,-17),(-17,-12),(-12,-7),(-7,-2), (-2,3), (3,8),(8,13),(13,18),(18,23),(23,28),(28,33)], closed ='left')
yearlyLandTemp['AvgTemp_bin'] = pd.cut(yearlyLandTemp.AvgTemp, bins, ordered =True)

yrlyLandTemp_sort = yearlyLandTemp.sort_values(by=['AvgTemp_bin','Year'], ascending = False)
display(yrlyLandTemp_sort.head())

display(yrlyLandTemp_sort.describe())
Year Code Country AvgTemp AvgTemp_bin
36287 2012 ABW Aruba 28.652333 [28, 33)
36293 2012 ARE United Arab Emirates 29.425833 [28, 33)
36304 2012 BFA Burkina Faso 28.630000 [28, 33)
36331 2012 DJI Djibouti 29.923583 [28, 33)
36354 2012 GMB Gambia 28.301667 [28, 33)
Year AvgTemp
count 36653.000000 36653.000000
mean 1908.205031 16.250927
std 66.305731 9.919839
min 1743.000000 -20.834167
25% 1862.000000 8.844250
50% 1913.000000 18.850917
75% 1963.000000 25.510333
max 2013.000000 30.127083
In [23]:
fig = px.choropleth(yrlyLandTemp_sort, locations = 'Code',color="AvgTemp_bin", hover_data =['AvgTemp'], hover_name="Country", 
                    title ='Average Land Temperature by Country: 1752 - 2012', projection = 'robinson', height= 800,
                    animation_frame = 'Year', color_discrete_sequence= px.colors.sequential.thermal)
fig.update_geos(resolution=50)
fig.show()

Countries highlight:

Between 1852 & 2012:
  • Canada increased by ~38% going from -5.2 in 1775 to -3.2 in 2012
  • US increase by ~25%
  • China increased by ~21%
  • All countries in this group increased from 5.4 - 38%
In [55]:
#Average Land Temperature in Canada: 1833 - 2012
LTbyC = pd.read_csv('GlobalLandTemperaturesByCountry.csv')
display(LTbyC.head())

LTbyC['dt'] = pd.to_datetime(LTbyC['dt'])
yseries = LTbyC['dt'].dt.year
mseries = LTbyC['dt'].dt.month

LTbyC = pd.DataFrame({'Year' : yseries, 'Month' : mseries, 'AverageTemperature' : LTbyC['AverageTemperature'], 
                       'Country' : LTbyC['Country']})

group = LTbyC.loc[LTbyC.Country == 'Canada'].groupby(['Year','Month'])
LTbyCA_sum = group.sum().loc[1833:2012]

CALandTemp = LTbyCA_sum.mean(axis=1, skipna=True).mean(level=['Year']).reset_index()
CALandTemp = CALandTemp.rename(columns={0:'LandAvgTemp'})
CALandTemp_20 = CALandTemp.loc[::10]

TotalTemp1833 = CALandTemp_20.loc[(CALandTemp_20.Year==1833), 'LandAvgTemp']
CALandTemp_20['% change since 1833']=CALandTemp_20.LandAvgTemp.apply(lambda x : ((x-TotalTemp1833)/TotalTemp1833)*100)
display(CALandTemp_20.head())
dt AverageTemperature AverageTemperatureUncertainty Country
0 1743-11-01 4.384 2.294 Ã…land
1 1743-12-01 NaN NaN Ã…land
2 1744-01-01 NaN NaN Ã…land
3 1744-02-01 NaN NaN Ã…land
4 1744-03-01 NaN NaN Ã…land
<ipython-input-55-0d0093ef0179>:20: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Year LandAvgTemp % change since 1833
0 1833 -5.248333 -0.000000
10 1843 -6.107750 16.375040
20 1853 -5.590917 6.527469
30 1863 -5.663167 7.904097
40 1873 -5.609250 6.876786
In [25]:
fig = go.Figure()
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
    go.Bar(x=CALandTemp_20['Year'], y=CALandTemp_20['% change since 1833'], opacity=0.5,
           marker_line_color='rgb(8,200,30)', marker_line_width=2, name="Change since 1833 (%)", 
           yaxis="y2"))
fig.add_trace(
    go.Scatter(x=CALandTemp_20['Year'], y=CALandTemp_20['LandAvgTemp'],name="Average Land Temperature (degrees Celsius)"))
fig.update_layout(title_text="Average Land Temperature in Canada: 1833 - 2015")
fig.update_xaxes(title_text="Year", type='category')
fig.update_yaxes(title_text="Average Land Temperature (degrees Celsius)", secondary_y=False)
fig.update_yaxes(title_text="Change since 1833 (%)", secondary_y=True, autorange="reversed")
In [26]:
fig.show()
In [28]:
LTbyC = pd.read_csv('GlobalLandTemperaturesByCountry.csv')
display(LTbyC.head())

LTbyC['dt'] = pd.to_datetime(LTbyC['dt'])
yseries = LTbyC['dt'].dt.year
mseries = LTbyC['dt'].dt.month

LTbyC = pd.DataFrame({'Year' : yseries, 'Month' : mseries, 'AverageTemperature' : LTbyC['AverageTemperature'], 
                       'Country' : LTbyC['Country']})

group_d = LTbyC.groupby(['Year','Country','Month'])
LTbyC_sum = group_d.sum().loc[1852:2012]

GLandTemp = LTbyC_sum.mean(axis=1, skipna=True).mean(level=['Year', 'Country']).reset_index()
GLandTemp = GLandTemp.rename(columns={0:'LandAvgTemp'})
AvgTemp1852 = GLandTemp.loc[(GLandTemp.Year == 1852)]
AvgTemp1852.set_index(['Country'], inplace=True)
#display(GLandTemp)

CAD = AvgTemp1852.loc['Canada', 'LandAvgTemp']
CHN = AvgTemp1852.loc['China', 'LandAvgTemp']
DNK = AvgTemp1852.loc['Denmark', 'LandAvgTemp']
GER = AvgTemp1852.loc['Germany', 'LandAvgTemp']
IND = AvgTemp1852.loc['India', 'LandAvgTemp']
BRZ = AvgTemp1852.loc['Brazil', 'LandAvgTemp']
IRN = AvgTemp1852.loc['Iran', 'LandAvgTemp']
JPN = AvgTemp1852.loc['Japan', 'LandAvgTemp']
LUX = AvgTemp1852.loc['Luxembourg', 'LandAvgTemp']
MEX = AvgTemp1852.loc['Mexico', 'LandAvgTemp']
RUS = AvgTemp1852.loc['Russia', 'LandAvgTemp']
SAU = AvgTemp1852.loc['Saudi Arabia', 'LandAvgTemp']
ROK = AvgTemp1852.loc['South Korea', 'LandAvgTemp']
CHE = AvgTemp1852.loc['Switzerland', 'LandAvgTemp']
USA = AvgTemp1852.loc['United States', 'LandAvgTemp']
#print(AvgTemp1852)

#Create same bar graph above to check the difference between Country.
def assign1852ByCountry(Country):
    if Country =='Canada':
        return CAD
    elif Country =='China':
        return CHN
    elif Country =='Denmark':
        return DNK
    elif Country =='Germany':
        return GER
    elif Country =='India':
        return IND
    elif Country =='Brazil':
        return BRZ
    elif Country =='Iran':
        return IRN
    elif Country =='Japan':
        return JPN
    elif Country =='Luxembourg':
        return LUX
    elif Country =='Mexico':
        return MEX
    elif Country =='Russia':
        return RUS
    elif Country =='Saudi Arabia':
        return SAU
    elif Country =='South Korea':
        return ROK
    elif Country =='Switzerland':
        return CHE
    elif Country =='United States':
        return USA 

GLandTemp['LandAvgTemp1852'] = GLandTemp.Country.apply(assign1852ByCountry)
GLandTemp = GLandTemp.dropna()

function = lambda x, y: (x-y)/y*100
GLandTemp=GLandTemp.assign(percent_change_since_1852= lambda x: ((x['LandAvgTemp']-x['LandAvgTemp1852'])/x['LandAvgTemp1852']*100))
display(GLandTemp) 
dt AverageTemperature AverageTemperatureUncertainty Country
0 1743-11-01 4.384 2.294 Ã…land
1 1743-12-01 NaN NaN Ã…land
2 1744-01-01 NaN NaN Ã…land
3 1744-02-01 NaN NaN Ã…land
4 1744-03-01 NaN NaN Ã…land
Year Country LandAvgTemp LandAvgTemp1852 percent_change_since_1852
25 1852 Brazil 24.359917 24.359917 0.000000
32 1852 Canada -5.358000 -5.358000 -0.000000
35 1852 China 5.857500 5.857500 0.000000
47 1852 Denmark -18.322667 -18.322667 -0.000000
64 1852 Germany 8.750583 8.750583 0.000000
... ... ... ... ... ...
38156 2012 Russia -3.901750 -6.039667 -35.397925
38167 2012 Saudi Arabia 26.988917 25.101167 7.520567
38181 2012 South Korea 12.391667 10.772000 15.035896
38189 2012 Switzerland 8.159250 7.232000 12.821488
38208 2012 United States 10.261083 8.183083 25.393851

2415 rows × 5 columns

In [29]:
bar = go.Bar()
fig_b = go.FigureWidget(data=bar)
fig_b.update_yaxes(range=[-70,50])

# A list passed to interact() will yield a drop-down interactor
@interact(Country = ['Brazil','Canada', 'China', 'Denmark', 'Germany','India','Iran','Japan','Luxembourg','Mexico',
                     'Russia','Saudi Arabia','South Korea','Switzerland','United States'])
def update_bar(Country = 'Germany'):
    data = GLandTemp.loc[GLandTemp.Country==Country]
    #display(data)
    data.set_index('Year', inplace=True)
    fig_b.update_traces(x=pd.Series(data.index.values), y=data.percent_change_since_1852, opacity=0.4, 
                          marker_line_color='rgb(8,50,107)', marker_line_width=2, name="Change since 2005 (%)")
    fig_b.update_layout(title_text="Average Land Temperature by Country: {0}".format(Country))
    fig_b.update_xaxes(title_text="Year", type='category')
    fig_b.update_yaxes(title_text="change_since_1852 (%)")
fig_b

Tasks : Guiding Question 5 adn 6

Data Wrangling

To address the guiding question 4 and 5, here is a list of what kind of data wrangling was conducted.

  1. How Greenhouse Gas (GHG) emission measured at each facilities station decrease or increased in Canada over years?
    • Any difference between province. If so, what is the main factor of its difference? (List of Data Wrangling)
      • Compute Total GHG Emission (kton. CO2 equivalent) by Year, and by State and Year
      • Compute Change of GHG Emission since 2005 (%) by Year, and by State and Year
      • Create Facility Group columns based on FacilityCode (string) from origirnal data to investigate the impact of Facility business sectors
  2. Any relationship between GHG emission and average surface temperatures in Canada? (List of Data Wrangling)
    • Compute Average Land Surface Temperature by Year, and by Season, Year and State
    • Merge two dataframe to make a visualization of average land temperature vs. Total GHG Emission

Greenhouse Gas (GHG) Dataset

  • Loading csv. , Renaming etc.
In [56]:
# Read "GHG emissino data in Canada" csv file
GHG = pd.read_csv("./PDGES-GHGRP-GHGEmissionsGES-2004-Present.csv",  encoding = "ISO-8859-1", engine='python')

# Change columns name:
GHG = GHG[["Reference Year / Année de référence" , "Facility Name / Nom de l'installation", "Facility City or District or Municipality / Ville ou District ou Municipalité de l'installation", "Facility Province or Territory / Province ou territoire de l'installation", "Latitude", "Longitude", "Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)", "English Facility NAICS Code Description / Description du code SCIAN de l'installation en anglais"]]    
GHG = GHG.rename({"Reference Year / Année de référence": "YEAR", "Facility Name / Nom de l'installation":"FacilityName", "Facility Province or Territory / Province ou territoire de l'installation":"State", "Total Emissions (tonnes CO2e) / Émissions totales (tonnes éq. CO2)":"TotalEmission", "Facility City or District or Municipality / Ville ou District ou Municipalité de l'installation":"FacilityCity", "English Facility NAICS Code Description / Description du code SCIAN de l'installation en anglais":"FacilityCode"}, axis='columns')    
for col in GHG.columns:
    print(col + ": " + str(GHG[col].dtype))

display(GHG.head())
display(GHG.tail())
display(GHG.describe())
YEAR: int64
FacilityName: object
FacilityCity: object
State: object
Latitude: float64
Longitude: float64
TotalEmission: float64
FacilityCode: object
YEAR FacilityName FacilityCity State Latitude Longitude TotalEmission FacilityCode
0 2018 Division Alma Alma Quebec 48.56500 -71.65556 9.522334e+04 Mechanical pulp mills
1 2018 Foothills Pipeline, Alberta Airdrie Alberta NaN NaN 3.823373e+05 Pipeline transportation of natural gas
2 2018 Kingston CoGen Bath Ontario 44.20950 -76.72460 1.195507e+03 Fossil-fuel electric power generation
3 2018 Redwater Fertilizer Operations Sturgeon County Alberta 53.84200 -113.09300 1.288410e+06 Chemical fertilizer (except potash) manufacturing
4 2018 Alberta Envirofuels Edmonton Alberta 53.53199 -113.36492 2.976423e+05 Other basic organic chemical manufacturing
YEAR FacilityName FacilityCity State Latitude Longitude TotalEmission FacilityCode
9600 2004 Windsor Essex Cogeneration Plant Windsor Ontario 42.2877 -82.9803 1.787137e+05 Fossil-Fuel Electric Power Generation
9601 2004 Wolf Lake/Primrose Thermal Operation Bonnyville Alberta NaN NaN 1.455941e+06 Non-Conventional Oil Extraction
9602 2004 Works 84, Owen Sound Flat Glass Plant Owen Sound Ontario NaN NaN 1.037091e+05 Glass Manufacturing
9603 2004 Zama Gas Plant: 1, 2, 3 Zama City Alberta NaN NaN 1.381902e+05 Conventional Oil and Gas Extraction
9604 2004 Usine Vaudreuil Jonquière Quebec NaN NaN 7.038797e+05 Primary Production of Alumina and Aluminum
YEAR Latitude Longitude TotalEmission
count 9605.000000 6204.000000 6204.000000 9.605000e+03
mean 2013.143883 49.900769 -76.603092 4.218623e+05
std 4.326713 6.375645 65.179512 1.208234e+06
min 2004.000000 -71.030000 -144.381000 6.000000e-01
25% 2010.000000 45.613300 -114.557000 3.080210e+04
50% 2014.000000 50.449350 -102.100105 9.546596e+04
75% 2017.000000 53.733200 -73.898675 3.051536e+05
max 2018.000000 73.301800 135.037900 1.788789e+07

General: Facility Annual GHG Emission distribution by Year

Here is the recorded Annual GHG data at each Facility. You can see a lot of variation of GHG Emission level.

In [57]:
fig = px.violin(GHG, y="TotalEmission", x="YEAR", color="YEAR", box=True,
          hover_data=GHG.columns)
fig.show()

TotalEmission data are available at each facility who meet the requirement that emit 10 kilotonnes or more of GHGs, in carbon dioxide (CO2) equivalent (eq.) units, per year. Since Government of Canada changed the reporting threshold which was lowered from 50 kt to 10 kt in 2017. For apple to apple comparison over years, facilities with emissions below 50kt were excluded from this trend analysis.

In [58]:
GHGrev = GHG.loc[(GHG.TotalEmission>=50000)] # Exclude the facilities rows < 50 kt.
GHGrev = GHGrev.loc[(GHG.YEAR >= 2005)]
#display(GHGrev.head())
#display(GHGrev.describe())

fig = px.histogram(GHGrev, x="YEAR", color="State")
fig.update_layout(title='Fig.1 Facilitiy Counts by province :Total Emission Gas >= 50 kt')
fig.update_xaxes(type='category', autorange="reversed")
fig.show()
  • Compute the Total GHG Emission, change % since 2005 by Year and by Year and State
In [59]:
# Compute Total GHG Emission and Change % since 2005 by Year
AnnualGHG = GHGrev.groupby(['YEAR']).sum().reset_index()
TotalGHG2005 = AnnualGHG.loc[(AnnualGHG.YEAR==2005), 'TotalEmission']
AnnualGHG['% of change since 2005']=AnnualGHG.TotalEmission.apply(lambda x : (x-TotalGHG2005)/TotalGHG2005*100)
display(AnnualGHG)
YEAR Latitude Longitude TotalEmission % of change since 2005
0 2005 1203.60070 -1679.66280 2.777613e+08 0.000000
1 2006 1207.58850 -1749.61410 2.715130e+08 -2.249515
2 2007 1153.19410 -1570.53620 2.769726e+08 -0.283949
3 2008 1399.45960 -1270.94210 2.719653e+08 -2.086690
4 2009 7463.95630 -8315.05550 2.522382e+08 -9.188850
5 2010 10264.35370 -11720.90340 2.624066e+08 -5.528040
6 2011 20239.18860 -14051.26520 2.548051e+08 -8.264723
7 2012 22233.43220 -21579.89230 2.584643e+08 -6.947325
8 2013 23425.50640 -27504.58560 2.594568e+08 -6.590024
9 2014 23795.00590 -31846.62680 2.622104e+08 -5.598677
10 2015 23017.68450 -32033.62050 2.625486e+08 -5.476888
11 2016 22752.07270 -42835.56360 2.619978e+08 -5.675195
12 2017 13110.80802 -25145.78848 2.698209e+08 -2.858706
13 2018 26492.70160 -50400.60079 2.713390e+08 -2.312163
In [60]:
# Compute Total GHG Emission and Change % since 2005 by Year and State
AnnualGHGState = GHGrev.groupby(['State', 'YEAR']).sum().reset_index()
TotalGHG2005 = AnnualGHGState.loc[(AnnualGHGState.YEAR==2005)] 
TotalGHG2005.set_index('State', inplace=True)
AB = TotalGHG2005.loc['Alberta', 'TotalEmission']
BC = TotalGHG2005.loc['British Columbia', 'TotalEmission']
NL = TotalGHG2005.loc['Newfoundland and Labrador', 'TotalEmission']
PE = TotalGHG2005.loc['Prince Edward Island', 'TotalEmission']
NS = TotalGHG2005.loc['Nova Scotia', 'TotalEmission']
NB = TotalGHG2005.loc['New Brunswick', 'TotalEmission']
QC = TotalGHG2005.loc['Quebec', 'TotalEmission']
ON = TotalGHG2005.loc['Ontario', 'TotalEmission']
MB = TotalGHG2005.loc['Manitoba', 'TotalEmission']
SK = TotalGHG2005.loc['Saskatchewan', 'TotalEmission']
NT = TotalGHG2005.loc['Northwest Territories', 'TotalEmission']
#NU = TotalGHG2005.loc['Nunavut', 'TotalEmission'] # No recored of 2005 GHG Emission of State Nunavun 

#print(TotalGHG2005)

# Create same bar and line graph above to check the difference between States.
def Assign2005ByState(State):
    if State =='Alberta':
        return AB
    elif State =='British Columbia':
        return BC
    elif State =='Newfoundland and Labrador':
        return NL
    elif State =='Prince Edward Island':
        return PE
    elif State =='Nova Scotia':
        return NS
    elif State =='New Brunswick':
        return NB
    elif State =='Quebec':
        return QC
    elif State =='Ontario':
        return ON
    elif State =='Manitoba':
        return MB
    elif State =='Saskatchewan':
        return SK
    elif State =='Northwest Territories':
        return NT

AnnualGHGState['TotalEmission2005'] = AnnualGHGState.State.apply(Assign2005ByState)
#display(AnnualGHGState)

function = lambda x, y: (x-y)/y*100

#AnnualGHGState['% of change since 2005']=AnnualGHGState.apply(lambda x: function(AnnualGHGState.TotalEmission, AnnualGHGState.TotalEmission2005))
AnnualGHGState=AnnualGHGState.assign(Percent_ChangeSince2005= lambda x: ((x['TotalEmission']-x['TotalEmission2005'])/x['TotalEmission2005']*100))

#display(AnnualGHGState) 
In [61]:
AnnualGHGState.set_index('State', inplace=True)
display(AnnualGHGState.head())
YEAR Latitude Longitude TotalEmission TotalEmission2005 Percent_ChangeSince2005
State
Alberta 2005 53.5841 -113.614 1.068345e+08 1.068345e+08 0.000000
Alberta 2006 53.5592 -113.349 1.137606e+08 1.068345e+08 6.482998
Alberta 2007 0.0000 0.000 1.123281e+08 1.068345e+08 5.142143
Alberta 2008 53.0000 117.000 1.194828e+08 1.068345e+08 11.839157
Alberta 2009 948.4327 -914.340 1.178865e+08 1.068345e+08 10.345038
  • Create Facility Group columns based on FacilityCode (string) from origirnal data to investigate the impact of Facility business sectors.
In [62]:
GHGindustry = GHGrev.groupby(['State', 'FacilityCode', 'YEAR']).sum().reset_index()
GHGindustry['FacilityCode'] = GHGindustry['FacilityCode'].str.lower()
GHGindustry['FacilityGroup'] = pd.np.where(GHGindustry['FacilityCode'].str.contains("oil"),"Oil and Gas",
                   pd.np.where(GHGindustry['FacilityCode'].str.contains("gas"),"Oil and Gas",
                   pd.np.where(GHGindustry['FacilityCode'].str.contains("manufacturing"),"Manufacturing",
                   pd.np.where(GHGindustry['FacilityCode'].str.contains("mining"),"Mining", "Others"))))
display(GHGindustry)

GHGindistryGroup = GHGindustry.groupby(['FacilityGroup', 'YEAR']).sum().reset_index()
display(GHGindistryGroup.head())
<ipython-input-62-147b2f0dcd77>:3: FutureWarning:

The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead

<ipython-input-62-147b2f0dcd77>:4: FutureWarning:

The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead

<ipython-input-62-147b2f0dcd77>:5: FutureWarning:

The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead

<ipython-input-62-147b2f0dcd77>:6: FutureWarning:

The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead

State FacilityCode YEAR Latitude Longitude TotalEmission FacilityGroup
0 Alberta all other basic inorganic chemical manufacturing 2005 0.00000 0.00000 133904.58000 Manufacturing
1 Alberta all other basic inorganic chemical manufacturing 2006 0.00000 0.00000 127578.56090 Manufacturing
2 Alberta all other basic inorganic chemical manufacturing 2007 0.00000 0.00000 125955.82590 Manufacturing
3 Alberta all other basic inorganic chemical manufacturing 2008 0.00000 0.00000 120682.72290 Manufacturing
4 Alberta all other basic inorganic chemical manufacturing 2009 0.00000 0.00000 96052.55404 Manufacturing
... ... ... ... ... ... ... ...
2120 Saskatchewan waste treatment and disposal 2014 52.09920 -106.70700 90593.29210 Others
2121 Saskatchewan waste treatment and disposal 2015 52.09920 -106.70700 91346.48604 Others
2122 Saskatchewan waste treatment and disposal 2016 52.09920 -106.70700 87303.85228 Others
2123 Saskatchewan waste treatment and disposal 2017 0.00000 0.00000 91502.27184 Others
2124 Saskatchewan waste treatment and disposal 2018 104.36313 -213.35871 168203.98654 Others

2125 rows × 7 columns

FacilityGroup YEAR Latitude Longitude TotalEmission
0 Manufacturing 2005 130.2187 -83.6077 5.425900e+07
1 Manufacturing 2006 173.9227 -163.3353 5.191711e+07
2 Manufacturing 2007 130.2187 -83.6077 5.030547e+07
3 Manufacturing 2008 274.3461 -24.8696 4.898958e+07
4 Manufacturing 2009 2308.9465 -3007.8294 3.913417e+07

Surface Land Temperature Dataset

  • Load csv.
In [63]:
# Read "global temperature by State" csv file
GlobalTempState = pd.read_csv("./GlobalLandTemperaturesByState.csv")
display(GlobalTempState.head())
display(GlobalTempState.tail())
display(GlobalTempState.describe())
GlobalTempState['dt'] = pd.to_datetime(GlobalTempState['dt']) 

for col in GlobalTempState.columns:
    print(col + ": " + str(GlobalTempState[col].dtype))
dt AverageTemperature AverageTemperatureUncertainty State Country
0 1855-05-01 25.544 1.171 Acre Brazil
1 1855-06-01 24.228 1.103 Acre Brazil
2 1855-07-01 24.371 1.044 Acre Brazil
3 1855-08-01 25.427 1.073 Acre Brazil
4 1855-09-01 25.675 1.014 Acre Brazil
dt AverageTemperature AverageTemperatureUncertainty State Country
645670 2013-05-01 21.634 0.578 Zhejiang China
645671 2013-06-01 24.679 0.596 Zhejiang China
645672 2013-07-01 29.272 1.340 Zhejiang China
645673 2013-08-01 29.202 0.869 Zhejiang China
645674 2013-09-01 NaN NaN Zhejiang China
AverageTemperature AverageTemperatureUncertainty
count 620027.000000 620027.000000
mean 8.993111 1.287647
std 13.772150 1.360392
min -45.389000 0.036000
25% -0.693000 0.316000
50% 11.199000 0.656000
75% 19.899000 1.850000
max 36.339000 12.646000
dt: datetime64[ns]
AverageTemperature: float64
AverageTemperatureUncertainty: float64
State: object
Country: object
  • Compute Annual Average Temperature in Canada
  • Compute Seasonal Average Temperature in Canada
In [64]:
# Add Year, Month columns from date.time frame of 'dt'. 
GlobalTempState['YEAR']=GlobalTempState['dt'].dt.year
GlobalTempState['MONTH']=GlobalTempState['dt'].dt.month

GlobalTempState=GlobalTempState.loc[(GlobalTempState.Country=="Canada")] # Only the temperature data in Canada will be used for this task.
GlobalTempState = GlobalTempState.drop(['dt', 'AverageTemperatureUncertainty'], axis=1) # Remove redundant columns

def season(Month):
    if Month >=3 and Month <= 5:
        return "spring"
    elif Month >=6 and Month <=8:
        return "summer"
    elif Month >=9 and Month <=11:
        return "autumn"
    elif Month ==12 or Month ==1 or Month ==2:
        return "winter"
GlobalTempState['Season']=GlobalTempState['MONTH'].apply(season)
display(GlobalTempState)
AverageTemperature State Country YEAR MONTH Season
15107 8.772 Alberta Canada 1768 9 autumn
15108 0.813 Alberta Canada 1768 10 autumn
15109 -8.319 Alberta Canada 1768 11 autumn
15110 -15.579 Alberta Canada 1768 12 winter
15111 -19.287 Alberta Canada 1769 1 winter
... ... ... ... ... ... ...
641226 2.858 Yukon Canada 2013 5 spring
641227 12.044 Yukon Canada 2013 6 summer
641228 12.458 Yukon Canada 2013 7 summer
641229 11.347 Yukon Canada 2013 8 summer
641230 5.267 Yukon Canada 2013 9 autumn

35358 rows × 6 columns

In [65]:
# Create table by Year, Month and State.
grouped = GlobalTempState.groupby(['YEAR', 'Season', 'MONTH', 'State'])
TempCanada = grouped.sum()
TempCanada=TempCanada.loc[2004:2012]

display(TempCanada.head())
display(TempCanada.tail())
AverageTemperature
YEAR Season MONTH State
2004 autumn 9 Alberta 7.656
British Columbia 6.414
Manitoba 10.149
New Brunswick 12.915
Newfoundland And Labrador 7.758
AverageTemperature
YEAR Season MONTH State
2012 winter 12 Nunavut -25.253
Ontario -12.287
Prince Edward Island -1.348
Saskatchewan -19.063
Yukon -28.727
In [66]:
# Compute annual averarge temperature and seasonal temperature for each province.
annualTemp = TempCanada.mean(axis=1, skipna=True).mean(level=['YEAR']).reset_index()
annualTemp = annualTemp.rename(columns={0:'AvgTemp'})
display(annualTemp)
seasonTempState = TempCanada.mean(axis=1, skipna=True).mean(level=['YEAR', 'Season', 'State']).reset_index()
seasonTempState=seasonTempState.rename(columns={0:'AvgTemp'})
display(seasonTempState.head())
YEAR AvgTemp
0 2004 -1.305819
1 2005 -0.010042
2 2006 0.684646
3 2007 -0.797597
4 2008 -0.950222
5 2009 -0.917681
6 2010 1.060528
7 2011 -0.234389
8 2012 0.172681
YEAR Season State AvgTemp
0 2004 autumn Alberta 1.402333
1 2004 autumn British Columbia 1.223333
2 2004 autumn Manitoba 1.324333
3 2004 autumn New Brunswick 6.939333
4 2004 autumn Newfoundland And Labrador 2.461333

Visualization : Guiding Question 5

5. How Greenhouse Gas (GHG) emission measured at each facilities station decrease or increased in Canada over years?

  • From Fig.1 (Reporting Facility Counts by Year in the above Data Wrangling section), the number of reporting facilities has increased from 330 to 540 over the year 2005 to 2018. While, overall reported total GHG emission was decreased by 2.3% from 2005 (Fig.2).
  • The significant decrease of GHG from 2005 to 2009 up to 9% was seen in Fig.2, but the period between 2010 to 2018, the degree of Total GHG Emission has continousuly growed up.
In [67]:
fig = go.Figure()
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
    go.Bar(x=AnnualGHG['YEAR'], y=AnnualGHG['% of change since 2005'], opacity=0.4,
                     marker_line_color='rgb(8,50,107)',
                     marker_line_width=2, name="Change since 2005 (%)", yaxis="y2"
    ))
fig.add_trace(
    go.Scatter(x=AnnualGHG['YEAR'], y=AnnualGHG['TotalEmission'], name="Total GHG Emission (kton)"
    ))
fig.update_layout(
    title_text="Fig.2 Total GHG Emission in Canada over years"
)
fig.update_xaxes(title_text="Year", type='category')
fig.update_yaxes(title_text="Total GHG Emission (kton)", secondary_y=False)
fig.update_yaxes(title_text="Change since 2005 (%)", secondary_y=True, autorange="reversed")
fig.show()

5-1. Any Total GHG trend difference over years between provinces in Canada?

  • There is a quite large variation of total GHG Emission over years between provinces.
  • The facilities located in West Canada Provinces (Alberta, Northwest Teritory, Blitish Columbia, Saskatuan) tend to have a trend that total GHG has increased over years. Especially, it is outstanding increase of GHG emission in Alberta.
  • On the other hand, the facilities located in East Canada Provinces (Ontario, Quebec, Nova Scotia, Newfoundland and Labrador, Prince Edward Island) suceeded to reduce the GHG emission continously since 2005 to 2018.
In [68]:
bar = go.Bar()
fig_bar = go.FigureWidget(data=bar)
fig_bar.update_yaxes(range=[-50, 90])

scatter = go.Scatter()
fig_scatter = go.FigureWidget(data=scatter)

# A list passed to interact() will yield a drop-down interactor
@interact(State = ['Alberta', 'British Columbia', 'Newfoundland and Labrador', 'Prince Edward Island', 'Nova Scotia', 'New Brunswick', 'Quebec', 'Ontario', 'Manitoba', 'Saskatchewan', 'Northwest Territories'])

def update_bar(State = 'Northwest Territories'):
    data = AnnualGHGState.loc[State]
    #print(data)
    data.set_index('YEAR', inplace=True)
    
    fig_bar.update_traces(x=pd.Series(data.index.values),
                          y=data.Percent_ChangeSince2005, opacity=0.4, marker_line_color='rgb(8,50,107)', marker_line_width=2, name="Change since 2005 (%)")
    fig_bar.update_layout(title_text="Fig. 3.1 % Change since 2005 over Years: {0}".format(State))
    fig_bar.update_xaxes(title_text="Year", type='category')
    fig_bar.update_yaxes(title_text="Change since 2005 (%)")
    
    fig_scatter.update_traces(x=pd.Series(data.index.values),
                              y=data.TotalEmission, line=dict(color='red', width=2), name="Total GHG Emission (kton)")
    fig_scatter.update_layout(title_text="Fig. 3.2 Total GHG Emission (kton) over Years : {0}".format(State))
    fig_scatter.update_xaxes(title_text="Year", type='category')
    fig_scatter.update_yaxes(title_text="Total GHG Emission (kton)")

fig_bar
In [69]:
fig_scatter

Figure 4 is the spatial mapping for the 2018 GHG Emission Reported to Environment and Climate Chagne Canada, which clearly shows that the highest GHG producers has concentrated on the Alberta.

In [70]:
GHG2018 = GHGrev.loc[(GHG.YEAR==2018)]
GHG2018.dropna(subset=['Latitude', 'Longitude'], inplace=True) 
GHG2018 = GHG2018.drop(['YEAR'], axis=1)

#hist = px.histogram(GHG2018, x="TotalEmission")
#hist.show()

def colorbin(GHG):
    if GHG >=0 and GHG < 100000:
        return "Below100k"
    elif GHG >=100000 and GHG < 500000:
        return "100k to < 500k"
    elif GHG >=500000 and GHG < 1000000:
        return "500k to < 1000k"
    elif GHG >= 1000000 and GHG < 2000000:
        return "1000K to < 2000k"
    elif GHG >=2000000:
        return "Over 2000k"
    
GHG2018['2018TotalEmission_kton']=GHG2018['TotalEmission'].apply(colorbin)

# display(GHG2018)
<ipython-input-70-ec0def1081e5>:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [72]:
fig = px.scatter_geo(data_frame=GHG2018, lat="Latitude", lon = "Longitude",
                    color="2018TotalEmission_kton", 
                    color_discrete_map={'Below100k':'blue',
                                        '100k to < 500k':'green',
                                        '500k to < 1000k':'yellow',
                                        '1000K to < 2000k':'orange',
                                        'Over 2000k':'red',
                                        },
                    category_orders={
                                    "2018TotalEmission_kton": [
                                            "Over 2000k",
                                            "1000K to < 2000k",
                                            "500k to < 1000k",
                                            "100k to < 500k",
                                            "Below100k"
                                            ]
                                    },
                     opacity = 0.8,
                     title ='Fig.4 2018 Facility Greenhouse Gas Emission Reported to Environment and Climate Change Canada',
                     scope = 'north america',
                     width = 1100,
                     height = 700,
                     size = 'TotalEmission'
                   )
fig.update_geos(
        visible = False,
        scope = 'north america',
        showland = True,
        landcolor = "rgb(212, 212, 212)",
        subunitcolor = "rgb(255, 255, 255)",
        countrycolor = "rgb(255, 255, 255)",
        showlakes = True,
        lakecolor = "rgb(255, 255, 255)",
        showsubunits = True,
        showcountries = True,
        resolution = 50,
        projection = dict(
            type = 'conic conformal',
            rotation_lon = -100
        ),
        lonaxis = dict(
            showgrid = True,
            gridwidth = 0.5,
            range= [ -140.0, -55.0 ],
            dtick = 5
        ),
        lataxis = dict (
            showgrid = True,
            gridwidth = 0.5,
            range= [ 20.0, 60.0 ],
            dtick = 5
        ),
    )
fig.show()

5-2. Why is there Total GHG Emission difference between Provinces? Does the Business Sector have impaacted to the GHG Emission Level?

  • Overall, GHG emission reported by Oil and Gas, Mining Business Sectors have experienced a steady increase over years while the GHG emission reported by Manufacturing and others has reduced.
In [73]:
fig = px.line(GHGindistryGroup, x='YEAR', y='TotalEmission', color='FacilityGroup')
fig.update_traces(mode='lines+markers')
fig.update_layout(title='Fig.5 Total GHG Emission by Main Facility Sectors: Oil and Gas, Mining, Manufacturing, Others')
fig.show()
  • The sunburst plot (Figure 6) provides the portion of Main Business Secotrs by each provinces.
  • We can confirm more than 50% of Reporting Facilities in Alberta are related to the oil and gas Sectors, which explains why continuous total GHG emission over years can be seen.
  • Ontario who hold highest populations in Canada, overall the total emission in 2018 is second high following Alberta but Ontario suceeds to decrease Total GHG Emission over years that main business sector is Manufacturing.
  • One concludes the high portion of Oil and Gas Business Secotors definitely contribute to release more GHG, which I use this trend to predict "How likely we will achive the target of 30% GHG reduction by 2030 based on Pari's aggreement" with multi-linear regression model for DATA602 Group Project.
In [74]:
GHG2018 = GHGindustry.loc[(GHG.YEAR==2018)]
GHG2018.dropna(subset=['FacilityCode', 'TotalEmission', 'FacilityGroup'], inplace=True) 
#display(GHG2018)

fig = px.sunburst(GHG2018, path=['State', 'FacilityGroup', 'FacilityCode'],
                values = 'TotalEmission',
                  color='TotalEmission', 
                  color_continuous_scale='balance')
fig.update_layout(title='Fig. 6 2018 Total GHG Emission (kton) by State and Main Facility Sectors')

fig.show()
#GHG2018.FacilityCode.unique
<ipython-input-74-2284240eb238>:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [75]:
AnnualGHGState.reset_index(inplace=True)
fig = px.line(AnnualGHGState, x='YEAR', y='TotalEmission', color='State')
fig.update_traces(mode='lines+markers')
fig.update_layout(title='Fig. 7 Total GHG Emission in Canada over years by province')
fig.show()

Visualization : Guiding Question 6

6. Any relationship between GHG emission and average surface temperatures in Canada?

  • There is no recognaizable trend between GHG Emission and Average Surface Temperatures in Canada (Fig. 8)
  • The scatter plot of summer temperature vs. Total emission by State also doesn't show any clear association between two.
  • Simply, most people believe high GHG Emission due to human activity would cause the global warnming. But in our dataset, the GHG reduction effect to temperature is uncertain. Since the dataset is only collected in Canada and relatively short term only covered 2005 to 2018, it might be hard to see clear influence between both.
In [76]:
mergeTable = pd.merge(AnnualGHG, annualTemp, on=['YEAR'])
#display(mergeTable)

fig = go.Figure()
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
    go.Scatter(x=mergeTable['YEAR'], y=mergeTable['AvgTemp'], opacity=0.4,
                     marker_line_color='rgb(8,50,107)',
                     marker_line_width=2, name="Average Temperature (degC)", yaxis="y2"
    ))
fig.add_trace(
    go.Scatter(x=mergeTable['YEAR'], y=AnnualGHG['TotalEmission'], name="Total GHG Emission (kton)"
    ))
fig.update_layout(
    title_text="Fig. 8.1 Average Temperature and Total GHG Emission in Canada over years"
)
fig.update_xaxes(title_text="Year", type='category')
fig.update_yaxes(title_text="Total GHG Emission (kton)", secondary_y=False)
fig.update_yaxes(title_text="Average Temperature (degC)", secondary_y=True)
fig.show()

mergeTableState = pd.merge(AnnualGHGState, seasonTempState, on=['YEAR', 'State'])
#display(mergeTableState)

test = mergeTableState.query("Season == 'summer'")
fig = px.scatter(test, x="AvgTemp", y="TotalEmission", color="State", size = "TotalEmission", hover_data=None, title = "Fig. 8.2 Average Temperature of Summer and Total GHG Emission by Procince")
fig.show()

Conclusion

We succeed to address the guiding question with Python functions: plotly express, Numpy through data wrangling and visualizations.

  • nan values were deleted from the datasets. Outliers were checked for all data columns and removed accordingly.
  • City data had better correlation with corresponding state and country than with global data
  • If there higher activity in a major city with more GHG emissions, its effect is transmitting to related state and country but not so strogly globally. (Another data trend between GHG emission and temperatures does not show any correlation between the two)
  • There was no distinct correlation between the emitters of greenhouse gases and green countries. One common conclusion from the global, Canada and Countries analysis is that Land Surface Temperature has been increasing for over a century and continues to do so. Albeit, at different paces. This is one of the key evidence of global warming.
  • Overall, Greenhouse Gas (GHG) Emission in Canada has been decreased up to 8% at the Year 2009 from 2005 but it has continuously increased.
  • Confirmed gradual increase of GHG emission over years for the provinces whose main business sector is oil and gas industry.
  • On the other hand, the Province such as Ontario has experienced the steady reduction of GHG over years.
  • It's hard to see clear correlation between Average surface temperature and Total GHG Emission in Canada based on the local data set in this study that is collected only in Canada during short term.